Wheeler graphs: A framework for BWT-based data structures

Authors

  • Travis Gagie
  • Giovanni Manzini
  • Jouni Sirén
Abstract

The famous Burrows-Wheeler Transform (BWT) was originally defined for a single string, but variations have been developed for sets of strings, labeled trees, de Bruijn graphs, etc. In this paper we propose a framework that includes many of these variations and that we hope will simplify the search for more. We first define Wheeler graphs and show they have a property we call path coherence. We show that if the state diagram of a finite-state automaton is a Wheeler graph then, by its path coherence, we can order the nodes such that, for any string, the nodes reachable from the initial state or states by processing that string are consecutive. This means that even if the automaton is non-deterministic, we can still store it compactly and process strings with it quickly. We then rederive several variations of the BWT by designing straightforward finite-state automata for the relevant problems and showing that their state diagrams are Wheeler graphs.
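The consecutiveness property described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the definition of a Wheeler graph, so the conditions checked below are an assumption recalled from the usual statement of the definition: nodes with in-degree zero come first, and for any two edges (u, v) labeled a and (u', v') labeled a', a < a' implies v < v', while a = a' and u < u' implies v <= v'. The names `is_wheeler_order`, `reachable`, `is_consecutive`, and the example graph `EDGES` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import product


def is_wheeler_order(num_nodes, edges):
    """Check the assumed Wheeler conditions for nodes numbered 0..num_nodes-1.

    `edges` is a list of (source, target, label) triples; the node numbers
    are taken to be the candidate ordering.
    """
    indegree = [0] * num_nodes
    for _, target, _ in edges:
        indegree[target] += 1

    # Nodes with in-degree 0 must precede every node with positive in-degree.
    seen_positive = False
    for node in range(num_nodes):
        if indegree[node] > 0:
            seen_positive = True
        elif seen_positive:
            return False

    # Compare every ordered pair of edges against the two label conditions.
    for (u1, v1, a1), (u2, v2, a2) in product(edges, repeat=2):
        if a1 < a2 and not v1 < v2:
            return False
        if a1 == a2 and u1 < u2 and not v1 <= v2:
            return False
    return True


def reachable(edges, start_nodes, string):
    """Return the sorted nodes reached from `start_nodes` by reading `string`."""
    current = set(start_nodes)
    for ch in string:
        current = {v for (u, v, a) in edges if u in current and a == ch}
    return sorted(current)


def is_consecutive(nodes):
    """True if a sorted list of distinct node numbers forms one interval."""
    return not nodes or nodes[-1] - nodes[0] + 1 == len(nodes)


# A small non-deterministic automaton (node 0 has two outgoing 'a' edges).
EDGES = [
    (0, 1, "a"), (0, 2, "a"), (3, 2, "a"), (4, 2, "a"),
    (0, 3, "b"), (1, 3, "b"), (2, 4, "b"),
]

if __name__ == "__main__":
    print(is_wheeler_order(5, EDGES))
    for length in range(1, 5):
        for letters in product("ab", repeat=length):
            word = "".join(letters)
            nodes = reachable(EDGES, {0}, word)
            print(word, nodes, is_consecutive(nodes))
```

Because the reachable set is always an interval in this ordering, it could be stored as just a pair of endpoints, which is the intuition behind processing strings quickly even on a non-deterministic automaton.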

Related resources

DNA Sequence Compression Using the Burrows-Wheeler Transform

We investigate off-line dictionary oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed...

A four-stage algorithm for updating a Burrows-Wheeler transform

We present a four-stage algorithm that updates the Burrows-Wheeler Transform of a text T, when this text is modified. The Burrows-Wheeler Transform is used by many text compression applications and some self-index data structures. It operates by reordering the letters of a text T to obtain a new text bwt(T) which can be better compressed. Even if recent advances are offering this structure ne...

An Application of Self-organizing Data Structures to Compression

List update algorithms have been widely used as subroutines in compression schemes, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-bas...

Searching for Unique DNA Sequences with the Burrows-Wheeler Transform

The objective of this study was to present an efficient algorithm that effectively aids the problem of searching for unique DNA sequences in the set of genes. The presented algorithm is based on the Burrows-Wheeler Transform (BWT), a very fast and effective data compression algorithm. The developed algorithm exploits all the advantages offered by the BWT algorithm and the suffix array data stru...

High-performance BWT-based Encoders

In 1994, Burrows and Wheeler [5] developed a data compression algorithm which performs significantly better than Lempel-Ziv based algorithms. Since then, a lot of work has been done in order to improve their algorithm, which is based on a reversible transformation of the input string, called BWT (the Burrows-Wheeler transformation). In this paper, we propose a compression scheme based on BWT, M...

Journal title:

Volume 698  Issue 

Pages  -

Publication date: 2017